1 GITHUB UPDATE & SUBMISSION INSTRUCTIONS (Delete before submission!)

For this and subsequent assignments you’ll follow the same basic github steps you used for Assignment 1, ie:

2 Prepare Transaction Data and Explore

We begin by loading our prepared AAVE transaction data into a dataframe. The dataset has over 400,000 rows, and 27 columns.

We are directly loading the dataframe from an Rds archive instead of a CSV file to conserve space.

#load Rds (binary version of csv file) into dataframe
# Assumes this notebook is in: ~/IDEA-Blockchain/DefiResearch/StudentNotebooks/Assignment02
df<-read_rds('../../Data/transactions.Rds')

# Let's take a quick look at the first few observation
head(df)

Now look at the summaries to see the types, values, and missingness (NA’s) of the data.

summary(df)
##      amount            borrowRate       borrowRateMode     onBehalfOf       
##  Min.   :        0   Min.   :    0.0           :386542   Min.   :2.578e+33  
##  1st Qu.:       24   1st Qu.:    3.3   Stable  : 18408   1st Qu.:4.174e+47  
##  Median :     1427   Median :    3.9   Variable: 76569   Median :7.522e+47  
##  Mean   :   191103   Mean   :    9.5                     Mean   :7.592e+47  
##  3rd Qu.:    24382   3rd Qu.:   10.8                     3rd Qu.:1.168e+48  
##  Max.   :600000000   Max.   :10002.0                     Max.   :1.461e+48  
##  NA's   :7289        NA's   :386542                      NA's   :7289       
##       pool              reserve         timestamp              user          
##  Min.   :9.862e+47   USDC   :105937   Min.   :1.607e+09   Min.   :2.578e+33  
##  1st Qu.:1.035e+48   WETH   :105279   1st Qu.:1.615e+09   1st Qu.:4.199e+47  
##  Median :1.035e+48   USDT   : 58266   Median :1.621e+09   Median :8.697e+47  
##  Mean   :1.034e+48   DAI    : 55211   Mean   :1.620e+09   Mean   :8.082e+47  
##  3rd Qu.:1.035e+48   LINK   : 26404   3rd Qu.:1.624e+09   3rd Qu.:1.173e+48  
##  Max.   :1.035e+48   WBTC   : 26344   Max.   :1.629e+09   Max.   :1.461e+48  
##                      (Other):104078                                          
##           type        reservePriceETH     reservePriceUSD    
##  borrow     : 94977   Min.   :1.000e+00   Min.   :0.000e+00  
##  deposit    :192006   1st Qu.:2.865e+14   1st Qu.:1.000e+00  
##  liquidation:  6289   Median :4.652e+14   Median :1.000e+00  
##  redeem     :126705   Mean   :3.458e+23   Mean   :6.774e+08  
##  repay      : 60542   3rd Qu.:9.411e+14   3rd Qu.:1.000e+00  
##  swap       :  1000   Max.   :1.647e+28   Max.   :4.252e+13  
##                       NA's   :7289        NA's   :7289       
##    amountUSD         collateralAmount  collateralReserve principalAmount  
##  Min.   :        0   Min.   :      0          :475230    Min.   :      0  
##  1st Qu.:       70   1st Qu.:      1   WETH   :  2665    1st Qu.:    962  
##  Median :     5836   Median :     14   LINK   :  1312    Median :   4362  
##  Mean   :   245851   Mean   :   5451   WBTC   :   686    Mean   :  66005  
##  3rd Qu.:    49871   3rd Qu.:    250   AAVE   :   333    3rd Qu.:  21533  
##  Max.   :754379487   Max.   :4638724   UNI    :   230    Max.   :4475668  
##  NA's   :7289        NA's   :475230    (Other):  1063    NA's   :475230   
##  principalReserve reservePriceETHPrincipal reservePriceUSDPrincipal
##         :475230   Min.   :1.000e+00        Min.   :    0.0         
##  USDC   :  2142   1st Qu.:4.062e+14        1st Qu.:    1.0         
##  USDT   :  1549   Median :4.682e+14        Median :    1.0         
##  DAI    :  1459   Mean   :1.556e+17        Mean   :  295.6         
##  GUSD   :   242   3rd Qu.:5.363e+14        3rd Qu.:    1.0         
##  TUSD   :   175   Max.   :4.203e+19        Max.   :83819.1         
##  (Other):   722   NA's   :475230           NA's   :475230          
##  reservePriceETHCollateral reservePriceUSDCollateral amountUSDPincipal
##  Min.   :1.000e+00         Min.   :0.000e+00         Min.   :      0  
##  1st Qu.:1.000e+00         1st Qu.:0.000e+00         1st Qu.:   1022  
##  Median :5.110e+14         Median :1.000e+00         Median :   4481  
##  Mean   :2.177e+21         Mean   :4.543e+06         Mean   :  67361  
##  3rd Qu.:1.110e+16         3rd Qu.:2.600e+01         3rd Qu.:  22066  
##  Max.   :9.116e+23         Max.   :2.509e+09         Max.   :4571839  
##  NA's   :475230            NA's   :475230            NA's   :475230   
##  amountUSDCollateral borrowRateModeFrom borrowRateModeTo  stableBorrowRate
##  Min.   :      0             :480519            :480519   Min.   :  0.0   
##  1st Qu.:      0     Stable  :   471    Stable  :   529   1st Qu.:  9.0   
##  Median :    476     Variable:   529    Variable:   471   Median : 10.9   
##  Mean   :  37060                                          Mean   : 11.7   
##  3rd Qu.:   7457                                          3rd Qu.: 12.0   
##  Max.   :5029023                                          Max.   :154.7   
##  NA's   :475230                                           NA's   :480519  
##  variableBorrowRate
##  Min.   :  0.0     
##  1st Qu.:  3.8     
##  Median :  3.9     
##  Mean   :  5.7     
##  3rd Qu.:  5.1     
##  Max.   :148.7     
##  NA's   :480519

First we’ll do some preliminary analysis before we ask detailed questions.

2.1 Analyze Transaction Types

Let’s examine the different types of transactions present in the data. We’ll make a simple bar plot to visualize the number of each transaction types. “Deposit” is the most common type of transaction, whereas “swaps” are the most rare.

#set color palette
colors = brewer.pal(6,"Set2")

#create barplot
barplot(table(df$type), main='Transaction Type Counts', xlab='Type',ylab='Count',col=colors)

There are more “deposits” than “borrows,” because users often need to overcollateralize for loans.

Now we’ll examine the amount of US dollars being used in the different types of transactions. We create box plots for the four types of transactions that have the “amount” feature associated with them, and we visualize the distribution of that column for the different transactions.

We can see that most transactions are completed with very little money.

#create boxplot
boxplot(amountUSD~type,data=df,outline=FALSE,col=colors,
        main="Transaction Amounts",xlab="Type",ylab="Amount (USD)")

We do find some very large amounts, so it’s helpful to look at this on a log scale.

boxplot(log(amountUSD)~type,data=df,outline=FALSE,col=colors,
        main="Log Transaction Amounts",xlab="Type",ylab="Log Amount (USD)")

Observation: There are many borrows and repays with high transactions amounts, but deposits and redeems have much lower transactions amounts.

2.2 Examine Reserve Coins

There are 50 different “Reserve” coins used in transactions in AAVE. Let’s create a table of those reserve coins with at least 500 transactions and rank order them by their volume.

# Use deplyr to drop NA reserves, add the counts and then keep only the top 20
reservecoins <- df %>%  drop_na(reserve) %>% 
  count(reserve) %>% 
  arrange(-n) %>% 
  head(22)

# Add the rank to help keep track of the reserve coins
reservecoins <- reservecoins %>% 
  mutate(rank=1:nrow(reservecoins),.before=reserve)

# List the results nicely with kable()
kable(reservecoins)
reserve n rank .before
USDC 105937 1 USDC
WETH 105279 2 WETH
USDT 58266 3 USDT
DAI 55211 4 DAI
LINK 26404 5 LINK
WBTC 26344 6 WBTC
AAVE 12174 7 AAVE
CRV 10593 8 CRV
UNI 7547 9 UNI
XSUSHI 7337 10 XSUSHI
SNX 6938 11 SNX
SUSD 6542 12 SUSD
6289 13
GUSD 6009 14 GUSD
YFI 5919 15 YFI
BUSD 4863 16 BUSD
TUSD 3317 17 TUSD
BAL 3152 18 BAL
MKR 3101 19 MKR
REN 2638 20 REN
ENJ 2457 21 ENJ
MANA 1993 22 MANA

Let’s look at the number of transactions types for each currency.

TopcoinSummary <- df %>% filter(reserve %in% reservecoins$reserve) %>%
  group_by(reserve) %>% 
  count(type) %>% 
  mutate(percent = n/sum(n)*100)

kable(TopcoinSummary) 
reserve type n percent
liquidation 6289 100.0000000
AAVE borrow 2 0.0164285
AAVE deposit 7028 57.7295876
AAVE redeem 5141 42.2293412
AAVE repay 3 0.0246427
BAL borrow 215 6.8210660
BAL deposit 2171 68.8769036
BAL redeem 612 19.4162437
BAL repay 154 4.8857868
BUSD borrow 1685 34.6493934
BUSD deposit 1135 23.3395024
BUSD redeem 836 17.1910343
BUSD repay 1207 24.8200699
CRV borrow 1054 9.9499670
CRV deposit 5780 54.5643349
CRV redeem 2607 24.6105919
CRV repay 1152 10.8751062
DAI borrow 14133 25.5981598
DAI deposit 18552 33.6019996
DAI redeem 13381 24.2361124
DAI repay 8895 16.1109199
DAI swap 250 0.4528083
ENJ borrow 234 9.5238095
ENJ deposit 1302 52.9914530
ENJ redeem 681 27.7167277
ENJ repay 239 9.7273097
ENJ swap 1 0.0407000
GUSD borrow 2282 37.9763688
GUSD deposit 1493 24.8460642
GUSD redeem 967 16.0925279
GUSD repay 1267 21.0850391
LINK borrow 1321 5.0030298
LINK deposit 15270 57.8321466
LINK redeem 8713 32.9987881
LINK repay 1097 4.1546735
LINK swap 3 0.0113619
MANA borrow 220 11.0386352
MANA deposit 1018 51.0787757
MANA redeem 563 28.2488710
MANA repay 192 9.6337180
MKR borrow 188 6.0625605
MKR deposit 1766 56.9493712
MKR redeem 986 31.7961948
MKR repay 159 5.1273783
MKR swap 2 0.0644953
REN borrow 196 7.4298711
REN deposit 1417 53.7149356
REN redeem 840 31.8423048
REN repay 183 6.9370735
REN swap 2 0.0758150
SNX borrow 433 6.2409916
SNX deposit 4002 57.6823292
SNX redeem 2052 29.5762468
SNX repay 451 6.5004324
SUSD borrow 1277 19.5200245
SUSD deposit 2403 36.7318863
SUSD redeem 1781 27.2240905
SUSD repay 1081 16.5239988
TUSD borrow 991 29.8763943
TUSD deposit 853 25.7160084
TUSD redeem 661 19.9276455
TUSD repay 796 23.9975882
TUSD swap 16 0.4823636
UNI borrow 567 7.5129190
UNI deposit 3912 51.8351663
UNI redeem 2540 33.6557573
UNI repay 527 6.9829071
UNI swap 1 0.0132503
USDC borrow 35469 33.4812200
USDC deposit 27586 26.0400049
USDC redeem 22131 20.8907181
USDC repay 20326 19.1868752
USDC swap 425 0.4011818
USDT borrow 22332 38.3276697
USDT deposit 12593 21.6129475
USDT redeem 10349 17.7616449
USDT repay 12719 21.8291971
USDT swap 273 0.4685408
WBTC borrow 2082 7.9031278
WBTC deposit 13994 53.1202551
WBTC redeem 8442 32.0452475
WBTC repay 1816 6.8934103
WBTC swap 10 0.0379593
WETH borrow 7234 6.8712659
WETH deposit 56373 53.5462913
WETH redeem 35505 33.7246744
WETH repay 6155 5.8463701
WETH swap 12 0.0113983
XSUSHI borrow 242 3.2983508
XSUSHI deposit 4382 59.7246831
XSUSHI redeem 2454 33.4469129
XSUSHI repay 259 3.5300532
YFI borrow 403 6.8085825
YFI deposit 2976 50.2787633
YFI redeem 2146 36.2561243
YFI repay 394 6.6565298

2.3 Look at Sample User Transaction Histories

Finally, we will examine the transaction history of different users. To do this, we will select 3 random users from the data who have completed between 100 and 300 transactions. Then, we create swarmplots displaying the different types of transactions those users made over time.

#set seed
set.seed(1)

# Select three random users that have between 100 and 300 transactions
users<-vector(length=3)
count<-0
while(count<=3){
  success<-FALSE
  while(!success){
    #get random user
    ruser<-sample(df$user,1)
    
    #check for valid number of transactions
    length<-nrow(filter(df,user==ruser))
    if (length>100 && length<300){
      users[count]=ruser
      success<-TRUE
      count<-count+1
    }
  }
}
df.rusers<-filter(df, user %in%users)

# Create a "swarmplot"

ggplot(df.rusers,aes(user, timestamp,color=type)) + 
        geom_beeswarm(cex=1)+
        coord_flip()+
        ggtitle("User Transaction History")

Observation: Users have very different transactions patterns, which we will try to better understand.

3 Activities

##\(\color{blue}{Exercise~1} (10 pts)\)

  1. Divide the top 20 reserve coins between your team members so each team member has the same amount of coins; feel free to add or subtract coins if necessary.

  2. Look up your coins on the internet to find out what they are. ProTip: Look them up on http://defipulse.com to see their Total Value Locked (TVL).

  3. Examine the percentage of transaction types for your coins. Hypothesize why a given coin might have more of one type of transaction than another.

  4. Prepare one slide with the findings for each of your coins. (one slide summarizing each coin)

  5. Coordinate with your team, to combine each of your member’s coin descriptions into a single presentation. Please develop a common format or template for presenting your coin summaries so that the common information for a coin (e.g. TVL) is shown in the same format. (Your slide summaries should look the same!)

  6. Be prepared to present your team presentation to your client in class on Thursday 9/16.

usdt_df <- df[df$reserve == "USDT", ]
boxplot(amountUSD~type,data=usdt_df,outline=FALSE,col=colors,
        main="Transaction Amounts",xlab="Type",ylab="Amount (USD)")

TopcoinSummary[TopcoinSummary$reserve == 'USDT',]
TopcoinSummary[TopcoinSummary$reserve == 'WBTC',]
TopcoinSummary[TopcoinSummary$reserve == 'XSUSHI',]
wbtc_df <- df[df$reserve == "WBTC", ]
boxplot(amountUSD~type,data=wbtc_df,outline=FALSE,col=colors,
        main="Transaction Amounts",xlab="Type",ylab="Amount (USD)")

xsushi_df <- df[df$reserve == "XSUSHI", ]
boxplot(amountUSD~type,data=xsushi_df,outline=FALSE,col=colors,
        main="Transaction Amounts",xlab="Type",ylab="Amount (USD)")

##\(\color{blue}{Exercise~2} (10 pts)\)

  1. Perform a “creative exploration” of some aspect of the Defi data that you find interesting. Add your work in this notebook. Your work should include at least one data visualization.

  2. Put your work in this notebook.

  3. Write a paragraph describing your findings in the context of DeFi.

  4. Be prepared to share (2 minutes) in team meeting.

  5. You’ll receive extra credit for work that goes “above and beyond” this assignment!

# Use above code to generate random users onBehalfOf

#set seed
set.seed(10)

# Select three random users that have between 100 and 300 transactions
users<-vector(length=3)
count<-0
while(count<=3){
  success<-FALSE
  while(!success){
    #get random user
    ruser<-sample(df$onBehalfOf,1)
    
    #check for valid number of transactions
    length<-nrow(filter(df,onBehalfOf==ruser))
    if (length>100 && length<300){
      users[count]=ruser
      success<-TRUE
      count<-count+1
    }
  }
}
randUsers<-filter(df, onBehalfOf %in%users)
randUsers <- randUsers[order(randUsers$onBehalfOf),]
head(randUsers)
#Network Graph of 1 random individual and the reserves they made transactions with
person <- unique(randUsers$onBehalfOf)[1]
person_df <- randUsers[randUsers$onBehalfOf == person,]
f <- c(as.character(person_df$reserve[1]))
t <- c()
for (i in 2:(nrow(person_df)-1)) {
  f[i] = as.character(person_df$reserve[i])
  t[i-1] = as.character(person_df$reserve[i])
}
t[i] = as.character(person_df$reserve[nrow(person_df)])

data <- tibble(
  from = f,
  to = t
)

#Plot
p <- simpleNetwork(data, height="100px", width="100px",        
        Source = 1,
        Target = 2,
        linkDistance = 10,
        charge = -900,               
        fontSize = 14,
        fontFamily = "serif",
        zoom = T)
p
#Network Graph of 3 random individuals and the reserves they made transactions with
f <- c(as.character(randUsers$reserve[1]))
t <- c()
for (i in 2:(nrow(randUsers)-1)) {
  f[i] = as.character(randUsers$reserve[i])
  t[i-1] = as.character(randUsers$reserve[i])
}
t[i] = as.character(randUsers$reserve[nrow(randUsers)])

data <- tibble(
  from = f,
  to = t
)

# Plot
p <- simpleNetwork(data, height="100px", width="100px",        
        Source = 1,
        Target = 2,
        linkDistance = 10,
        charge = -900,               
        fontSize = 14,
        fontFamily = "serif",
        zoom = T
        )
p
# Attempt to create Plotly line chart with transaction history for an individual user
# filtered by coin, I messed around with it for many hours but couldn't quite get it to work

df_sort <- df.rusers[order(df.rusers$timestamp),]
dropdown <- list()
num_coins <- length(unique(as.character(df_sort$reserve))) 
for (i in 1:num_coins) {
  coin <- unique(as.character(df_sort$reserve))[i]
  time <- df_sort[df_sort$reserve == coin,]$timestamp
  y__ <- df_sort[df_sort$reserve == coin,]$amount
  data <- data.frame(time, y__)
  scatter_fig <- plot_ly(data, x = ~time, type = "scatter", mode = "lines+markers")
  if (i == 1) {
    scatter_fig <- scatter_fig %>% add_lines(y = ~y__, color = ~df_sort[df_sort$reserve == coin,]$type, name = ~df_sort[df_sort$reserve == coin,]$type)
  } else {
    scatter_fig <- scatter_fig %>% add_lines(y = ~y__, color = ~df_sort[df_sort$reserve == coin,]$type, name = ~df_sort[df_sort$reserve == coin,]$type)
  }
  if (i == 1) {
    dropdown <- c(dropdown, list(list(method = "update", args = list("visible", c(list(T), rep(F, each = num_coins - 1))), label = coin)))
  } else {
   dropdown <- c(dropdown, list(list(method = "update", args = list("visible", c(rep(F, each = i - 1), list(T), rep(F, each = num_coins - i))), label = coin)))
  }
}

scatter_fig <- scatter_fig %>% layout(
  title = "Transactions By Coin",
  updatemenus = list(
    list(
      y = 0.9,
      buttons = dropdown 
    )
  )
)

scatter_fig
# This is what the previous plot is supposed to look like without the filtering

df_sort <- df.rusers[order(df.rusers$timestamp),]

time <- df_sort[df_sort$reserve == "WBTC",]$timestamp
amount <- df_sort[df_sort$reserve == "WBTC",]$amount

data <- data.frame(time, amount)

fig <- plot_ly(data, x = ~time, y = ~amount, type = 'scatter', color = ~df_sort[df_sort$reserve == "WBTC",]$type, name = ~df_sort[df_sort$reserve == "WBTC",]$type, mode = 'lines+markers') %>%
        layout(title = "WBTC")

fig

Findings

My first set of plots are network graphs using the NetworkD3 package. The purpose of these plots is to encompass an individual’s transaction history by specific coin. I wanted to visualize the volume of transactions for each coin as well as a general sense for order of how these transaction occurred. From the first plot, you can see the individual spent a lot time with the cluster on the right (SUSD, CRV, SNX, DAI, USDC) and then gradually moved over to some coins on the left side of the network as time went on. The goal of this was to visualize what types of coins were related based on the time frame the user interacted with them. The other network plot deals with three random individuals instead of one.

My next set of plots were line charts using Plotly. This was again based on an individual’s transactions with a coin. I wanted to see the order of different types of transactions as well an amount associated with each one. My goal was to make it filterable so that we could see all the coins the individual has ever interacted with but I could not get the filtering to work properly. Looking at the second line chart, we can see the individual made many deposits and redeems in WBTC and spiked in activity towards the middle of their time interacting with the coin. It would be interesting to also equate these transactions with the value of the coin.